Privacy-Preserving Classification of Customer Data without Loss of Accuracy

نویسندگان

  • Zhiqiang Yang
  • Sheng Zhong
  • Rebecca N. Wright
چکیده

Privacy has become an increasingly important issue in data mining. In this paper, we consider a scenario in which a data miner surveys a large number of customers to learn classification rules on their data, while the sensitive attributes of these customers need to be protected. Solutions have been proposed to address this problem using randomization techniques. Such solutions exhibit a tradeoff of accuracy and privacy: the more each customer’s private information is protected, the less accurate result the miner obtains; conversely, the more accurate the result, the less privacy for the customers. In this paper, we propose a simple cryptographic approach that is efficient even in a many-customer setting, provides strong privacy for each customer, and does not lose any accuracy as the cost of privacy. Our key technical contribution is a privacy-preserving method that allows a data miner to compute frequencies of values or tuples of values in the customers’ data, without revealing the privacy-sensitive part of the data. Unlike general-purpose cryptographic protocols, this method requires no interaction between customers, and each customer only needs to send a single flow of communication to the data miner. However, we are still able to ensure that nothing about the sensitive data beyond the desired frequencies is revealed to the data miner. To illustrate the power of our approach, we use our frequency mining computation to obtain a privacypreserving naive Bayes classifier learning algorithm. Initial experimental results demonstrate the practical efficiency of our solution. We also suggest some other applications of privacy-preserving frequency mining.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارایه یک روش جدید انتشار داده‌ها با حفظ محرمانگی با هدف بهبود دقّت طبقه‌‌بندی روی داده‌های گمنام

Data collection and storage has been facilitated by the growth in electronic services, and has led to recording vast amounts of personal information in public and private organizations databases. These records often include sensitive personal information (such as income and diseases) and must be covered from others access. But in some cases, mining the data and extraction of knowledge from thes...

متن کامل

Privacy-preserving Distributed Analytics: Addressing the Privacy-Utility Tradeoff Using Homomorphic Encryption for Peer-to-Peer Analytics

Data is becoming increasingly valuable, but concerns over its security and privacy have limited its utility in analytics. Researchers and practitioners are constantly facing a privacy-utility tradeoff where addressing the former is often at the cost of the data utility and accuracy. In this paper, we draw upon mathematical properties of partially homomorphic encryption, a form of asymmetric key...

متن کامل

Differentially Private Local Electricity Markets

Privacy-preserving electricity markets have a key role in steering customers towards participation in local electricity markets by guarantying to protect their sensitive information. Moreover, these markets make it possible to statically release and share the market outputs for social good. This paper aims to design a market for local energy communities by implementing Differential Privacy (DP)...

متن کامل

Privacy-Preserving Evaluation of Generalization Error and Its Application to Model and Attribute Selection

Privacy-preserving classification is the task of learning or training a classifier on the union of privately distributed datasets without sharing the datasets. The emphasis of existing studies in privacy-preserving classification has primarily been put on the design of privacy-preserving versions of particular data mining algorithms, However, in classification problems, preprocessing and postpr...

متن کامل

T Evaluating the Classification Accuracy of Data Mining Algorithms for Anonymized Data

Recent advances in hardware technology have increased storage and recording capability with regard to personal data on individuals. This has created fears that such data could be misused. To alleviate such concerns, data was anonymized and many techniques were recently proposed on performing data mining tasks in ways which ensured privacy. Anonymization techniques were drawn from a variety of r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005